Warming up for NUTS (No U-Turn)

I have been thinking about writing a blog on why the no u-turn sampler (NUTS) works rather than describing the actual algorithm. This led me to look at Jared Tobin’s Haskell implementation. His example tries to explore the Himmelblau function but only finds one local minima. This is not unexpected; as the excellent Stan manual notes

Being able to carry out such invariant inferences in practice is an altogether different matter. It is almost always intractable to find even a single posterior mode, much less balance the exploration of the neighborhoods of multiple local maxima according to the probability masses.

and

For HMC and NUTS, the problem is that the sampler gets stuck in one of the two "bowls" around the modes and cannot gather enough energy from random momentum assignment to move from one mode to another.

rm(list = ls(all.names=TRUE))
unlink(".RData")

rstan::stan_version()
## [1] "2.12.0"
rstan_options(auto_write = TRUE)

On the Rosenbrock function it fares much better.

knitr::include_graphics("RosenbrockA.png")

plot of chunk unnamed-chunk-2

We can’t (at least I don’t know how to) try Stan out on Rosenbrock as its not a distribution but we can try it out on another nasty problem: the funnel. Some of this is taken directly from the Stan manual.

Here’s the Stan:

parameters {
  real y;
  vector[9] x;
}
model {
  y ~ normal(0,3);
  x ~ normal(0,exp(y/2));
}

which we can run with the following R:

funnel_fit <- stan(file='funnel.stan', cores=4, iter=10000)
## Warning: There were 92 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
funnel_samples <- extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
funnel_df <- data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])

Plotting the data requires some unpleasantness but shows the neck of the funnel does not get explored. So even HMC and NUTS do not perform well.

midpoints <- function(x, dp=2){
    lower <- as.numeric(gsub(",.*","",gsub("\\(|\\[|\\)|\\]","", x)))
    upper <- as.numeric(gsub(".*,","",gsub("\\(|\\[|\\)|\\]","", x)))
    return(round(lower+(upper-lower)/2, dp))
}

df <- funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > -20 & funnel_df$y < 9 & funnel_df$y > -9,]
x_c <- cut(df$x1, 20)
y_c <- cut(df$y, 20)
z <- table(x_c, y_c)
z_df <- as.data.frame(z)
a_df <- data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)

m = as.matrix(dcast(a_df,x ~ y))
## Using f as value column: use value.var to override.
hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21], border="black",ticktype = "detailed",theta=5,phi=20)
## Using f as value column: use value.var to override.

Since the analytic form of the distribution is known, one can apply a trick to correct this problem and then one is sampling from unit normals!

parameters {
  real y_raw;
  vector[9] x_raw;
}
transformed parameters {
  real y;
  vector[9] x;

  y <- 3.0 * y_raw;
  x <- exp(y/2) * x_raw;
}
model {
  y_raw ~ normal(0,1);
  x_raw ~ normal(0,1);
}

And now the neck of the funnel is explored.

funnel_fit <- stan(file='funnel_reparam.stan', cores=4, iter=10000)
funnel_samples <- extract(funnel_fit,permuted=TRUE,inc_warmup=FALSE);
funnel_df <- data.frame(x1=funnel_samples$x[,1],y=funnel_samples$y[])

df <- funnel_df[funnel_df$x1 < 20 & funnel_df$x1 > -20 & funnel_df$y < 9 & funnel_df$y > -9,]
x_c <- cut(df$x1, 20)
y_c <- cut(df$y, 20)
z <- table(x_c, y_c)
z_df <- as.data.frame(z)
a_df <- data.frame(x=midpoints(z_df$x_c),y=midpoints(z_df$y_c),f=z_df$Freq)

m = as.matrix(dcast(a_df,x ~ y))
## Using f as value column: use value.var to override.
hist3D(x=m[,"x"],y=as.double(colnames(m)[2:21]),z=(as.matrix(dcast(a_df,x ~ y)))[,2:21], border="black",ticktype = "detailed",theta=5,phi=20)
## Using f as value column: use value.var to override.

We’d expect the Haskell implementation to also fail to explore the neck. Maybe I will return to this after the article on why NUTS works.

Leave a comment